Research Title
“Utilizing Dynamic Harmonic Regression Models for Forecasting
Complaints: A First-Case Study from the Call Center of Rome Capital,
2021-2022”
Research Type
[MoD | method-over-data]
Abstract
Life in Rome is characterized by chaos and frenetic
activity. The city spans a vast area with limited underground
connections, and it faces challenges such as rural animal invasions,
widespread traffic congestion, air pollution, and significant concerns
regarding garbage collection, locally known as “monnezza”.
This paper examines the organizational structure of efforts to
address these issues, including the roles, number, and responsibilities
of the figures involved. Data collected from various channels,
including reports and complaints submitted to the Citizen’s Digital
Home, Call Center, Help Desk, and Email, by both resident and
non-resident citizens, tourists, and city users, are analyzed to provide
insights influenced by the socio-political structure of Rome and the
diverse objectives of the Call Center.
While initial attempts at clustering may not be straightforward, our
research aims to delve deeper into the data to better understand the
city’s issues and potentially support the efforts of city officials.
This study seeks to answer questions regarding the organization of
efforts and the expected outcomes, including the identification of key
tasks and figures required.
Our primary objective will be prediction. Key
challenges include capturing the seasonality of certain phenomena and
understanding the temporal and socio-economic dynamics, while keeping in
mind that some issues may not necessarily worsen over time.
Specifically, we will try to forecast complaints at the municipality level
using Dynamic Harmonic Regression (DHR) models, as
they allow for the systematic analysis of temporal patterns and trends,
enabling parametric forecasting with enhanced accuracy and efficiency.
Main research aim & framework
Municipality
The main goal is to predict where the next complaints will arise and what
they will be about. We are also interested in clustering them and in
localizing them within the municipalities of Rome.
Tasks
We would also like to see how the complaints are distributed: whether
they are spread more or less evenly over the year or whether there are
specific time points where they peak, and to understand the reasons
behind these peaks.
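A minimal sketch of this first descriptive step, assuming each complaint has already been reduced to its opening date (a Python `datetime.date`): count complaints per calendar month, pooling the years, and flag the months whose count exceeds the mean by more than `k` standard deviations. The function names and the threshold `k=1.5` are hypothetical choices for illustration, not part of the dataset or of any cited method.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def monthly_counts(dates):
    """Number of complaints in each calendar month (Jan..Dec), pooling years."""
    per_month = Counter(d.month for d in dates)
    return [per_month[m] for m in range(1, 13)]


def peak_months(counts, k=1.5):
    """Months (1-12) whose count exceeds mean + k * population std."""
    mu, sd = mean(counts), pstdev(counts)
    return [m for m, c in enumerate(counts, start=1) if c > mu + k * sd]


if __name__ == "__main__":
    # Toy data: one complaint per month plus a burst of ten extra in August.
    dates = [date(2021, m, 1) for m in range(1, 13)]
    dates += [date(2021, 8, d) for d in range(2, 12)]
    print(monthly_counts(dates))
    print(peak_months(monthly_counts(dates)))
```

With only two years of data, a pooled monthly profile like this is a rough screen: the months it flags (August in the toy data) are candidates for explanation, not confirmed seasonal effects.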
Secondary goals
We need to take into account that the data are accessible only for a
limited period and that they are discontinuous across certain years.
The Genesis of Our Concept & Relevance
The idea originated from personal
experience. In Rome, everyone has called 060606 to complain, or at
least thought of doing so; after the first attempt, however,
expectations are so badly disappointed that any further attempts are
abandoned. We would like to take a first step towards properly
understanding these problems and why they are so hard to solve. Up to
now, if you see a fallen tree, you think: someone else will take care of
it. In August, garbage overflows: it is normal, since everyone knows
that the employees are on vacation, and so on. And as for the bike lanes
introduced by the Raggi administration almost 5 years ago, how could one
not expect complaints? From cyclists, because it is still normal to find
cars parked on them, and from drivers, because it is unfair to narrow
the roads and increase traffic (drivers are, after all, the greatest
experts on how to manage and improve city traffic: by allowing everyone
to comfortably use their car. Of course, don't you think?!). Much more
could be said, but we will stop here; by now it should be clear how the
idea for this project was born: we are tireless complainers :) .
Papers (up to now!)
- Yong Wang, Ilze Ziedins, Mark Holmes, Neil Challands (2012). Tree
models for difference and change detection in a complex
environment. The Annals of Applied Statistics, 6,
pp. 1162-1184.
This article interested us in the first place because it builds a
differential tree model from multiple data sets and aims to detect
distributional differences between them. In particular, it distinguishes
two types of fires in the Australian countryside, which matches the
eventual clustering task in our case, at least as far as the desired
output is concerned.
- Tom Vercauteren, Pradeep Aggarwal, Xiaodong Wang, Ta-Hsin Li (2007).
Hierarchical Forecasting of Web Server Workload Using Sequential Monte
Carlo Training. IEEE Transactions on Signal Processing, 55,
pp. 1286-1297.
This article addresses a forecasting problem and solves it using
DHR (Dynamic Harmonic Regression) and SMC (Sequential Monte Carlo)
methods. We liked this article because the problem is similar to ours
and the methodology is explained step by step.
- Rausch, T. M., Albrecht, T., & Baier, D. (2022). Beyond the
beaten paths of forecasting call center arrivals: On the use of dynamic
harmonic regression with predictor variables. Journal of Business
Economics, 92, 675–706. https://doi.org/10.1007/s11573-021-01075-4
This article suggests a new approach to the DHR method. It improves
forecasts by including predictor variables in the DHR model in order to
capture the effects of contextual factors such as holidays and reminder
e-mails.
Data source(s)
Data is collected and retrieved from the Open Data section of
Roma Capitale, whose website can be accessed here. In
particular, the dataset we have selected contains all cases, including
calls and complaints, which also encompass those related to AMA, the
public corporation responsible for coordinating and managing waste
collection in Rome. These cases are collected through various channels
such as CzRM (the Digital Home of the Citizen), Call Centers, Physical
Complaints, and Email, and they involve both citizens and non-citizens
of Rome, as well as city-users and tourists. The dataset is released as
open data and can be found here.
Potential gaps in the data
Regarding the data collection process, as mentioned earlier, we will
utilize previously collected data, thereby accessing it directly.
However, given that this data originates from a new experimental
open-source database of Roma Capitale, it will require preprocessing and
cleaning. We anticipate difficulties in rectifying the
dataset, particularly concerning time dependencies, and will consider
multiple approaches to address them, assessing the performance of each.
Additionally, some feature engineering will be
necessary.
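One way the cleaning step could look, as a sketch only: the column names `case_id` and `opening_date` and the timestamp format are hypothetical placeholders, since the actual headers of the Roma Capitale export still have to be checked. The function drops duplicate case IDs, skips rows whose timestamp cannot be parsed (counting them, so the loss can be reported), and returns daily counts in which days without cases appear as explicit zeros rather than silently missing.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical column names and timestamp format: check them against the
# headers of the actual Roma Capitale export before use.
ID_COL, DATE_COL, DATE_FMT = "case_id", "opening_date", "%Y-%m-%d %H:%M:%S"


def daily_counts(csv_file):
    """Return ({date: n_cases} with zero-filled gaps, n_rows_with_bad_dates)."""
    seen, per_day, bad_dates = set(), Counter(), 0
    for row in csv.DictReader(csv_file):
        case_id = (row.get(ID_COL) or "").strip()
        if not case_id or case_id in seen:
            continue  # missing ID or duplicate of an already counted case
        try:
            day = datetime.strptime((row.get(DATE_COL) or "").strip(), DATE_FMT).date()
        except ValueError:
            bad_dates += 1  # unparseable timestamp: excluded, but tallied
            continue
        seen.add(case_id)
        per_day[day] += 1
    if not per_day:
        return {}, bad_dates
    # Walk the full calendar so days without cases appear as explicit zeros.
    filled, day, last = {}, min(per_day), max(per_day)
    while day <= last:
        filled[day] = per_day[day]
        day += timedelta(days=1)
    return filled, bad_dates


if __name__ == "__main__":
    sample = io.StringIO(
        "case_id,opening_date\n"
        "1,2021-08-01 09:00:00\n"
        "1,2021-08-01 09:00:00\n"
        "2,2021-08-03 10:30:00\n"
        "3,not a date\n"
    )
    print(daily_counts(sample))
```

Zero-filling matters for the time-series models: a missing day and a day with no complaints are different things, and keeping the bad-date tally makes the amount of discarded data visible.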
In terms of size, the two datasets, “case open” and “case closed”,
are in .csv format, with sizes of approximately 800 KB and 11,300 KB
and about 7,050 and 13,000 rows per year, respectively. For now, they
can easily be extended to cover two years. Further extensions, however,
are limited by the current update state of the portal (missing data);
we will probably avoid them, on the hypothesis that they would not
improve the results.
Furthermore, given that the two datasets offer different information,
matching the information temporally may pose challenges. Our
initial approach will primarily focus on the closed dataset, tentatively
exploring the open dataset, and eventually describing the differences we
expect to encounter.
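For the temporal matching, a minimal sketch assuming both datasets have already been reduced to daily counts keyed by date (`{date: count}`); the function name `align_daily` is hypothetical. It places both series on a single calendar and lists the days covered by only one of them, so that a zero caused by a gap in one dataset can be told apart from a genuine absence of cases.

```python
from datetime import date, timedelta


def align_daily(open_counts, closed_counts):
    """Align two {date: count} series on one calendar spanning both.

    Returns (rows, only_open, only_closed): rows are (day, open, closed)
    tuples with 0 where a series has no entry; only_open / only_closed list
    the days present in just one of the two series.
    """
    all_days = set(open_counts) | set(closed_counts)
    if not all_days:
        return [], [], []
    rows, only_open, only_closed = [], [], []
    day, last = min(all_days), max(all_days)
    while day <= last:
        o, c = open_counts.get(day), closed_counts.get(day)
        if o is not None and c is None:
            only_open.append(day)
        elif c is not None and o is None:
            only_closed.append(day)
        rows.append((day, o or 0, c or 0))
        day += timedelta(days=1)
    return rows, only_open, only_closed


if __name__ == "__main__":
    opened = {date(2022, 1, 1): 3, date(2022, 1, 2): 5}
    closed = {date(2022, 1, 2): 4, date(2022, 1, 3): 2}
    print(align_daily(opened, closed))
```

Days absent from both series still get a row with two zeros but are not listed as one-sided gaps; whether such days are true zeros or missing data has to be decided from the source.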
Model & Methods
We plan to use the DHR (Dynamic Harmonic Regression)
method for this project. Among the articles we found while searching
the literature on our prediction problem,
“Hierarchical Forecasting of Web Server Workload Using Sequential
Monte Carlo Training” and “Beyond the beaten paths of forecasting call
center arrivals: on the use of dynamic harmonic regression with
predictor variables” caught our attention the most. The DHR method
produces forecasts from time series while also capturing seasonality.
Its main idea is to represent the time series through sinusoidal
(harmonic) functions, which makes periodic patterns easier to
recognize, and then to apply a dynamic modelling technique.
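To make the harmonic idea concrete, the sketch below fits only the static core of a harmonic regression, y_t = b0 + b1·t + Σ_k [a_k cos(2πkt/m) + c_k sin(2πkt/m)] + e_t for k = 1..K, by ordinary least squares with NumPy. Here m is the seasonal period (for example 7 for a weekly cycle in daily counts) and K the number of harmonics. This is a simplification, not the full DHR: it omits the dynamic part (ARIMA errors or time-varying amplitudes), and the function names are hypothetical.

```python
import numpy as np


def fourier_terms(t, period, K):
    """Columns cos(2*pi*k*t/period), sin(2*pi*k*t/period) for k = 1..K (K >= 1)."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, K + 1):
        w = 2.0 * np.pi * k * t / period
        cols.extend([np.cos(w), np.sin(w)])
    return np.column_stack(cols)


def design(t, period, K):
    """Intercept, linear trend and K harmonic pairs."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t), t, fourier_terms(t, period, K)])


def fit_harmonic(y, period, K):
    """Least-squares coefficients of the static harmonic regression."""
    X = design(np.arange(len(y)), period, K)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


def forecast_harmonic(beta, n_obs, horizon, period, K):
    """Extrapolate the fitted trend and harmonics over the next `horizon` steps."""
    return design(np.arange(n_obs, n_obs + horizon), period, K) @ beta


if __name__ == "__main__":
    t = np.arange(70)  # ten weeks of synthetic daily counts
    y = 50 + 0.1 * t + 10 * np.sin(2 * np.pi * t / 7)
    beta = fit_harmonic(y, period=7, K=3)
    print(forecast_harmonic(beta, n_obs=70, horizon=7, period=7, K=3).round(2))
```

K controls the smoothness of the seasonal shape: each extra harmonic adds a cos/sin pair, and K must stay strictly below m/2 for the columns to remain linearly independent.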
Since we expect to observe strong seasonality in our
dataset, we believe the DHR method will help us capture those
effects. For the dynamic modelling part, we plan to use the
Sequential Monte Carlo (SMC) method, as the authors of the first of the
above-mentioned articles did, because it is flexible, especially when
working with non-linear and non-Gaussian data such as ours.
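A generic bootstrap particle filter illustrates the SMC idea; it is a sketch under our own assumptions and not the algorithm of Vercauteren et al. We assume, hypothetically, that daily counts are Poisson with log-rate l_t + a_t·cos(ωt) + b_t·sin(ωt), with ω = 2π/period, where the level l_t and the harmonic amplitudes a_t, b_t follow Gaussian random walks; letting the amplitudes vary over time is what makes the harmonic regression dynamic. At each step the particles are propagated, weighted by the Poisson likelihood of the new count, and resampled. The random-walk step sizes `q` are set by hand here rather than estimated.

```python
import numpy as np


def particle_filter_dhr(y, period, n_particles=2000, q=(0.02, 0.02, 0.02), seed=0):
    """Bootstrap particle filter for a Poisson model with time-varying harmonics.

    Hypothetical state: (level, a, b), each a Gaussian random walk with step
    sd q; y_t ~ Poisson(exp(level + a*cos(w t) + b*sin(w t))), w = 2*pi/period.
    Returns the one-step-ahead predictive means E[y_t | y_0..y_{t-1}].
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    w = 2.0 * np.pi / period
    # Broad prior: level around log of the first count, amplitudes around 0.
    particles = np.column_stack([
        rng.normal(np.log(y[0] + 1.0), 1.0, n_particles),
        rng.normal(0.0, 0.5, n_particles),
        rng.normal(0.0, 0.5, n_particles),
    ])
    step_sd = np.asarray(q, dtype=float)
    preds = np.empty(len(y))
    for t, obs in enumerate(y):
        # 1) Propagate every particle through the random-walk transition.
        particles += rng.normal(0.0, step_sd, particles.shape)
        log_rate = (particles[:, 0]
                    + particles[:, 1] * np.cos(w * t)
                    + particles[:, 2] * np.sin(w * t))
        rate = np.exp(log_rate)
        # 2) Predict y_t before seeing it (particles are equally weighted here).
        preds[t] = rate.mean()
        # 3) Weight by the Poisson log-likelihood; log(y!) is constant and dropped.
        logw = obs * log_rate - rate
        wts = np.exp(logw - logw.max())
        wts /= wts.sum()
        # 4) Multinomial resampling to counter weight degeneracy.
        particles = particles[rng.choice(n_particles, size=n_particles, p=wts)]
    return preds


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(140)
    lam = np.exp(np.log(40.0) + 0.4 * np.sin(2 * np.pi * t / 7))
    y = rng.poisson(lam)
    preds = particle_filter_dhr(y, period=7)
    print(np.mean(np.abs(preds[70:] - lam[70:])))
```

The bootstrap variant is the simplest SMC scheme; more efficient proposals or resampling schemes would be natural refinements if this kind of filter turns out to be adequate.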
Additionally, we might include some predictor variables
(most likely holidays) in our models to test the approach proposed in
the second article, comparing the results of classic DHR with those of
DHR with predictor variables.
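A sketch of the comparison proposed in the second article: the same harmonic regression (intercept plus Fourier terms, fitted by least squares) with and without a 0/1 holiday column, scored by mean absolute error on a hold-out period. This is a static simplification of DHR with predictor variables, not the exact specification of Rausch et al., and the function names are hypothetical. A 0/1 column that repeats exactly with the seasonal period (for example every Sunday) can be collinear with the Fourier terms when K is at its maximum, so holidays should be irregular with respect to the period.

```python
import numpy as np


def design(t, period, K, holiday=None):
    """Intercept + K Fourier pairs, optionally with a 0/1 holiday column."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        w = 2.0 * np.pi * k * t / period
        cols += [np.cos(w), np.sin(w)]
    if holiday is not None:
        cols.append(np.asarray(holiday, dtype=float))
    return np.column_stack(cols)


def holdout_mae(y, holiday, period, K, n_train, use_holiday):
    """Fit on the first n_train points by least squares; return MAE on the rest."""
    y = np.asarray(y, dtype=float)
    X = design(np.arange(len(y)), period, K, holiday if use_holiday else None)
    beta, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    return float(np.mean(np.abs(X[n_train:] @ beta - y[n_train:])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(140)
    hol = np.zeros(140)
    hol[[10, 25, 47, 60, 88, 100, 115, 130]] = 1.0  # irregular synthetic holidays
    y = 100 + 15 * np.sin(2 * np.pi * t / 7) - 40 * hol + rng.normal(0, 2, 140)
    for use in (False, True):
        print("with holidays" if use else "classic", holdout_mae(y, hol, 7, 3, 100, use))
```

On real data the gain is not guaranteed; the point of the comparison is precisely to measure whether the extra column improves hold-out accuracy.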
Innovation in the topic
The main inherent difficulty of the problem is that we are working
with a dataset for which we have found no similar structures or prior
research. Furthermore, knowledge of Rome and its socio-economic
characteristics is advisable. From this perspective, we rely
on our “Romanity” (P.S.: although not Roman
by blood, Simay feels Roman at heart), the sources available on
the Roma Capitale website, and various statistics, particularly
concerning the switchboard and the distribution of poverty across
municipalities. This uncertainty and the presence of multiple factors
are among the difficulties we seek to address.
Regarding the temporal horizon, should we succeed in building a
predictive model, we are particularly interested in Dynamic Harmonic
Regression (DHR) models. These models offer a dynamic framework for
capturing temporal patterns and seasonality, which fits the
nature of our data and the challenges posed by the problem. By
incorporating DHR models into our predictive analyses, we aim to improve
the accuracy and robustness of our forecasts and thus provide useful
insights for decision-makers and stakeholders. More generally, this
choice reflects our aim of adopting methods that can handle the
complexity of urban socio-economic systems and, ultimately, support
better-informed policy-making and resource allocation.
IML analysis
We also plan to carry out an IML analysis, using Cluster
Analysis and Temporal and Geospatial Analysis to get a better
picture of the data and to build the best possible prediction model for
new complaints. For the Cluster Analysis, we plan to
use tree-based clustering to identify similar patterns or
groups within our data. We expect this analysis to uncover hidden
structures and associations that can guide targeted interventions and
personalized strategies. Furthermore, with geospatial and temporal
analysis, we would like to explore the spatial and temporal distribution
of the complaints, in the hope of uncovering trends, hotspots, and
patterns over time and space that can also inform resource allocation.
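For the hotspot part of the temporal and geospatial analysis, a minimal sketch assuming each complaint has been reduced to a (municipality, month) pair. Using the municipality as the spatial unit and the rule "count above the municipality's own monthly mean plus `k` standard deviations" are our assumptions for illustration; this is a simple screening rule, not the tree-based clustering itself, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def hotspots(records, k=2.0):
    """Flag (municipality, month) cells with unusually many complaints.

    records: iterable of (municipality, month) pairs, one per complaint.
    Each municipality is compared with its own monthly profile, so large
    municipalities are not flagged merely for being large.
    """
    cell = Counter(records)
    profile = defaultdict(lambda: [0] * 12)  # municipality -> 12 monthly counts
    for (muni, month), n in cell.items():
        profile[muni][month - 1] = n
    flagged = []
    for muni, counts in sorted(profile.items()):
        mu, sd = mean(counts), pstdev(counts)
        if sd == 0:
            continue  # perfectly flat profile: nothing stands out
        flagged += [(muni, m) for m, c in enumerate(counts, start=1) if c > mu + k * sd]
    return flagged


if __name__ == "__main__":
    records = [("I", m) for m in range(1, 13) for _ in range(5)]
    records += [("I", 8)] * 25  # synthetic August surge in municipality I
    records += [("VI", m) for m in range(1, 13) for _ in range(20)]
    print(hotspots(records))
```

Comparing each municipality with itself, rather than with the city-wide average, separates a seasonal surge from a structurally high volume; both are relevant for resource allocation but call for different responses.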
References
Portal
Open Data
The list of main articles is available on Moodle as a .bib
file.
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J.
(1984). Classification and Regression Trees. Wadsworth,
Belmont.
Li, T.-H. and Hinich, M. J. (2002). A Filter Bank
Approach for Modeling and Forecasting Seasonal Patterns.
Technometrics, 44, pp. 1-14.
Wang, Y., Ziedins, I., Holmes, M. and Challands, N. (2012). Tree
Models for Difference and Change Detection in a Complex
Environment. The Annals of Applied Statistics, 6,
pp. 1162-1184.
Vercauteren, T., Aggarwal, P., Wang, X. and Li, T.-H. (2007).
Hierarchical Forecasting of Web Server Workload Using Sequential Monte
Carlo Training. IEEE Transactions on Signal Processing, 55,
pp. 1286-1297.
Rausch, T. M., Albrecht, T. and Baier, D. (2022). Beyond
the Beaten Paths of Forecasting Call Center Arrivals: On the Use of
Dynamic Harmonic Regression with Predictor Variables. Journal of
Business Economics, 92, pp. 675-706.
Project Timeline
Concerning the timeline, we expect it to change over the course of the
project, but for now the plan is as follows:
- 3 weeks to study the model and pre-process the data;
- 1 week to train the model;
- 1 week to evaluate the results;
- 2 weeks to compare the results with other techniques and draw conclusions.
Here is a sketch of our timeline: